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We study Newton type methods for inverse problems described by non- 
linear operator equations F{u) = g in Banach spaces where the Newton 
equations F'^Un] Un+i —Un) = g — F{un) are regularized variationally using a 
general data misfit functional and a convex regularization term. This general- 
izes the well-known iteratively regularized Gauss-Newton method (IRGNM) . 
We prove convergence and convergence rates as the noise level tends to both 
for an a priori stopping rule and for a Lepskii-type a posteriori stopping rule. 
Our analysis includes previous order optimal convergence rate results for the 
IRGNM as special cases. The main focus of this paper is on inverse prob- 
lems with Poisson data where the natural data misfit functional is given by 
the Kullback-Leibler divergence. Two examples of such problems are dis- 
cussed in detail: an inverse obstacle scattering problem with amplitude data 
of the far-field pattern and a phase retrieval problem. The performence of 
the proposed method for these problems is illustrated in numerical examples. 

1 Introduction 

This study has been motivated by applications in photonic imaging, e.g. positron emis- 
sion tomography |47|, deconvolution problems in astronomy and microscopy [Sj, phase 
retrieval problems [29] or semi-blind deconvolution problems, i.e. deconvolution with 
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partially unknown convolution kernel |44j. In these problems, data consist of counts 
of photons which have interacted with the object of interest. The inverse problem of 
recovering the information on the object of interest from such photon counts can be 
formulated as an operator equation 



F{u)=g 



if one introduces an operator F : ^ d X ^ y mapping a mathematical description 
u G S of the object of interest to the photon density g ^ y C. L^(M) on the manifold 
M at which measurements are taken. In this paper we focus on problems where the 
operator F is nonlinear. 

For fundamental physical reasons, photon count data are described by a Poisson pro- 
cess with the exact data 5^ as mean if read-out noise and finite averaging volume of 
detectors is neglected. Ignoring this a priori information often leads to non-competitive 
reconstruction methods. 

To avoid technicalities in this introduction, let us consider a discrete version where the 
exact data vector belongs to [0,oo)'', and 5fj is the expected number of counts of the 
jth detector. Then the observed count data are described by a vector g°^^ G Nq of J 
independent Poisson distributed random variables with mean g'^ . A continuous version 



will be discussed in section 



Since - \nl'[g°^''\g\ 



In 



^obs 



{gf-\) 



^jiSj — 5° In^j] + c with a constant c independent of g (except for the special cases 
specified in eq. ([2|), the negative log-likelihood data misfit functional is given by 



5 (5°"'; 5 



J 



00 



E gj - gf' In 5,] , 5 > and {j : gf' > 0, 5, = 0} 
else. 



(2) 



using the convention OlnO := 0. Setting g°^'^ = g^ and subtracting the minimal value 
Leibler divergence 



attained at g = g^ , we obtain a discrete version of the KuUback- 



J 

E 

00, 



9j -9] -9] In 



9] 



5>0, {j:5j>0,<7, =0} 
else . 



(3) 



Note that both S and KL are convex in their second arguments. 

A standard way to solve perturbed nonlinear operator equations ([T]) is the GauB-Newton 
method. If F' denotes the Gateaux derivative of F, it is given by given by u^+i '■= 



argmin„g23 11-^ {un) + F' (n„; u - Un) - g' 



,obs||2 



As explained above, for data errors with 



a non-Gaussian distribution it is in general not appropriate to use a squared norm 
as data misfit functional. Therefore, we will consider general data misfit functionals 
S : y°^^ X y —7- (—00, 00] where 3^°*^^ is a space of (possibly discrete) observations 9°^^. 
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Since inverse problems are typically ill-posed in the sense that F and its derivatives 
F'{un] •) do not have continuous inverses, regularization has to be used. Therefore, we 
add a proper convex penalty functional IZ : X ^ (—00,00], which should be chosen 
to incorporate a priori knowledge about the unknown solution . This leads to the 
iteratively regularized Newton-type method 

S (g°^';F (un) + F' {un, u - Un)) + anil (n)l (4a) 



Un+i '■= argmm 

which will be analyzed in this paper. The regularization parameters q„ are chosen such 
that 

ao < 1, an \ 0, 1 < < Cdec for all n G N (4b) 

for some constant Cdeci typically a„ = aoCdec with Cdcc = 3/2. 

If 3^ = M'^, F{u) = (-F'j(^t))j=i,...,d, and S is given by ([2]), we obtain the convex mini- 
mization problems 

J 

Un+i := argmin \Fj {un) + F', {un;u - Un) - 

«e<B„ L^tt (5) 

- gf' ln{Fj (un) +F^{un;u- u„))] + a„7^ (n) 

in each Newton step where *B„ := {u G *B | 5 {g°^^; F{u) + F' [un] u — Un)) < 00}. In 
principle, several methods for the solution of ([s]) are available. In particular we mention 



inverse scale space methods [13 38 for linear operator equations and total variation 
penalties TZ. EM-type methods cannot readily be used for the solution of the convex 
minimization problems ^ (or subproblems of the inverse scale space method as in flS] ) 
if F'{un', •) is not positivity preserving as in our examples. A simple algorithm for the 
solution of subproblems of the type ([S]) is discussed in sectionJTj We consider the design 
of more efficient algorithms for minimizing the functionals (pFfor large scale problems 
as an important problem for future research. 

The most common choice of the data misfit functional is S{g;g) = \\g — g\\y with a 
Hilbert space norm || • This can be motivated by the case of (multi-variate) Gaussian 
errors. If the penalty term is also given by a Hilbert space norm TZ (u) = ||m — uo||^, 
Q becomes the iteratively regularized Gauss-Newton method (IRGNM) which is one of 



the most popular methods for solving nonlinear ill-posed operator equations |2|[3|[9 , 32 
If the penalty term ||n — uo||^ is replaced by ||ii — one obtains the Levenberg- 
Marquardt method, which is well-known in optimization and has first been analyzed 
as regularization method in [2l]. Recently, a generalization of the IRGNM to Banach 
spaces has been proposed and analyzed by Kaltenbacher & Hofmann |311. 



As an alternative to ^ we mention Tikhonov-type or variational regularization methods 
of the form 

s(g°'^';F{u))+an{u)] . (6) 



Ua ■= argmm 
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Here a > is a regularization parameter. For nonlinear operators this is in general a 
non-convex optimization problem even if S (^g°^'^; •) and TZ are convex. Hence, ([6]) may 
have many local minima and it cannot be guaranteed that the global minimum can be 
found numerically. Let us summarize some recent convergence results on this method: 
Bardsley [4] shows stability and convergence for linear operators and S = KL. Benning 
& Burger U\ prove rates of convergence for linear operators under the special source 
condition F*uj E dTl{u^). Generalizations to nonlinear operators and general variational 



source conditions were published simultaneously by Bot & Hofmann 12 , Flemming 17 
and Grasmair \20]. 



Given some rule to choose the stopping index our main results (Theorems 2.3 and 



4.2) establish rates of convergence of the method Q, i.e. uniform estimates of the error 



of the final iterate in terms of some data noise level err 

< Cip(err) (7) 



for some increasing, continuous function (p : [0, oo) — )• [0, oo) satisfying ip{0) = 0. For the 
classical deterministic error model ||(7°^'^ — 5|| < S and S [g°^^;g) = \\g — g°^^\\^ with some 
r > 1 we have err = 6'^ . In this case we recover most of the known convergence results 
on the IRGNM for weak source conditions. Our main results imply error estimates for 
Poisson data provided a concentration inequality holds true. In this case err = -X= where 
t can be interpreted as an exposure time proportional to the expected total number of 
photons, and an estimate of the form ([T]) holds true with the right hand side replaced 
by an expected error. 

As opposed to a Hilbert or Banach space setting our data misfit functional S does not 
necessarily fulfill a triangle inequality. Therefore, it is necessary to use more general 
formulations of the noise level and the tangential cone condition, which controls the 
degree of nonlinearity of the operator F. Both coincide with the usual assumptions if S 
is given by a norm. Our analysis uses variational methods rather than methods based 
on spectral theory, which have recently been studied in the context of inverse problems 



by a number of authors (see, e.g., 14,25 ,31,41,43] 



The plan of this paper is as follows: In the following section we formulate our first main 



convergence theorem (Theorem 2.3) and discuss its assumptions. The proof will be given 
in section [3j In the following section [4] we discuss the case of additive variational in- 
equalities and state a convergence rates result for a Lepskii-type stopping rule (Theorem 
4.2). In sectionjsjwe compare our result to previous results on the iteratively regularized 
Gauss-Newton method. Section [6] is devoted to the special case of Poisson data, which 
has been our main motivation. We conclude our paper with numerical results for an 
inverse obstacle scattering problem and a phase retrieval problem in optics in section [7j 

2 Assumptions and convergence theorem with a priori stopping 
rule 

Throughout the paper we assume the following mapping and differentiability properties 
of the forward operator F: 
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Assumption 1 (Assumptions on F and TZ): Let X and y be Banach spaces and let 
^ C X a convex subset. 

Assume that the forward operator F : ^ ^ y and the penalty functional TZ : X ^ 
(—00, 00] have the following properties: 

1. F is injective. 

2. F : !B — )• 3^ is continuous, the first variations F'(u; v — u) := lim^x^o ji^i'^ + t{v — 
u)) — F{u)) exist for all u, w E 55, and h 1— )• F'{u] h) can be extended to a bounded 
linear operator F'[u] G L{X,y) for all u G *B. 

3. TZ is proper and convex. 

At interior points G 5S the second assumption amounts to Gateaux differentiability of 
F. 

To motivate our assumptions on the data misfit functional, let us consider the case that 
^obs _ p(^i^^^ _|_ g^j^d ^ is Gaussian white noise on the Hilbert space y, i.e. (^,5) ~ 
iV(0, \\gf) and B{^,g) {^,g) = {g,g) for all g,g e y. U y = M"^, then the negative 
log- likelihood functional is given by S [g'^^'^',g) = \\g — 9°^'^\\2- However, in an infinite 
dimensional Hilbert space 3^ we have ||5'°^'^||3; = 00 almost surely, and S {g°^^:, •) = 00 
is obviously not a useful data misfit term. Therefore, one formally subtracts 
(which is independent of g) to obtain S [g°^^;g) ■= \\g\\y — 2 (^g°^^,g'jy. For exact data 

g^ we can of course use the data misfit functional T [g^;g) = \\g — g^ly- As opposed to 
S, the functional T is nonnegative and does indeed describe the size of the error in the 
data space y. It will play an important role in our analysis. 

It may seem cumbersome to work with two different types data misfit functionals S 
and T, and a straightforward idea to fix the free additive constant in S is to introduce 
<S{g°^^;g) ■= S[g°^^;g) —5 with 5 := infgS [g°^^;g). Then we obtain indeed that 
'5 {g^',g) = T {g^;g). However, the expected error E|5 {g°^^;g) — s — T {g^;g) f is not 
minimized for 5 = 5, but for s = E5 [g'^^^;g) — T{g^',g) = —\\g^\'^. Note that s de- 
pends on the unknown g^ , but this does not matter since the value of s does not affect 
the numerical algorithms. For this choice of s the error has the convenient representa- 
tion S {g°^'';g) + - T{g^;g) = -2{^,g)y. Bounds on sup^^y \{^,g)y\ with high 
probabilities for certain subsets 3^ C 3^ (concentration inequalities) have been studied 
intensively in probability theory (see e.g. [35]). Such results can be used in case of Gaus- 
sian errors to show that the following deterministic error assumption holds true with 
high probability and uniform bounds on err{g) for g £ y. 

Assumption 2 (data errors, properties of S and T): Let G *B C be the exact 
solution and denote by g"^ := F (n"!") G y the exact data. Let 3'°'^^ be a set containing 
all possible observations and g°^^ G 3'°'^^ the observed data. Assume that: 

1. The fidelity term T : F {^) x y ^ [0, 00] with respect to exact data fulfills 
T{g^;g^)=0. 
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2. T and the fidelity term S : y°^^ x 3^ — )• (—00,00] with respect to noisy data are 
connected as follows: There exists a constant Ccrr ^ 1 and functionals err : y — >■ 
[0, 00] and s : F (*B) — )• (—00, 00) such that 



5(<7°*^^;<7) -s(5^) < C,„r ( g^;g)+C,„ err (g) 



r(g^;g] < C, 



+ Ccrr err {g) (8b) 



for all g ^y. 
Example 2.1. 

y, 



1. Additive deterministic errors in Banach spaces. Assume that y 



•obs 



||c,obs _ ^t|| < and s {g2] gi) = T {g2; gi) = \\gi - g2\\y 
withr G [1,00). Then it follows from the simple inequalities {a + bY < 2"^^^ [a^ + b^) 



and \a — b\ + 6'" > 2 ^'a^ that ([8j) holds true with err 

nr— 1 

err — 



„obs 



5 = and 



2. For randomly perturbed data a general recipe for the choice of S,T and 5 is to 
define S as the log-likelihood functional, s(g^) := E^t^ {g°^^;g^) and T {g'^;g) := 
'EgfS(yg°^^;g^—5{g'^). Then we always have T {g'^ ; g'^^ =0, but part 2. of Assump- 
tion has to be verified case by case. 

3. Poisson data. For discrete Poisson data we have already seen in the introduction 
that the general recipe of the previous point yields S given by T = KL and 

It is easy to see that KL (5^^; 5) > for all 5^ and 

r = 1 and 



g] In ig] 



5(5^) = E/=i [9, 

g. Then (jsl) holds true with Ce 



err{g) 



obs 



00, 



g>o,{j ■9j = 0,g]+gf'>0} 



else . 



Obviously, it will be necessary to show that err (g) is finite and even small in some 
sense for all g for which the inequalities ([s]) are applied (see section^. 

To simplify our notation we will assume in the following analysis that s = or equiva- 
lently replace S [g°^^] g) by S [g°^^; g) — 5{g^). As already mentioned in the motivation 
of Assumption^ it is not relevant that 5{g^) is unknown since the value of this additive 



constant does not influence the iterates n„ in (4a). 



Typically S and 7~ will be convex in their second arguments, but we do not need this 
property in our analysis. However, without convexity it is not clear if the numerical 
solution of (4a) is easier than the numerical solution of ([6]). 



Assumption 3 (Existence): For any n G N the problem (4a) has a solution. 
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Remark 2.2. By standard arguments the following properties are sufficient to ensure 



existence of a solution to (4a) for convex S (^g°^^; (see '17 , 25, 40^): 



There are possibly weaker topologies tx, Ty on X,y respectively such that 

1. ^ is sequentially closed w.r.t. tx, 

2. F' (u; •) is sequentially continuous w.r.t. tx and Ty for all n G !B, 

3. the penalty functional IZ : X ^ (—00,00] is sequentially lower semi- continuous 
with respect to tx, 

4- the sets Mfi{c) := |u G A" | TZ{u) < c| are sequentially pre-compact with respect 
to Tx for all c gM. and 

5. for each g°^^ the data misfit term S [g°^^; •) : 3^ — ?• (—00,00] is sequentially lower 
semi- continuous w.r.t. Ty. 



Note that for our analysis we do not require that the solution to (4a) is unique or depends 
continuously on the data g"^'^ even though these properties are desirable for other reasons. 
Obviously, uniqueness is given if S is convex and TZ is strictly convex, and there are 
reasonable assumptions on S which guarantee continuous dependence, cf. [40^ - 

All known convergence rate results for nonlinear ill-posed problems under weak source 
conditions assume some condition restricting the degree of nonlinearity of the operator 
F. Here we use a generalization of the tangential cone condition which was introduced 
in 22 and is frequently used for the analysis of regularization methods for nonlinear 
inverse problems. It must be said, however, that for many problems it is very difficult 
to show that this condition is satisfied (or not satisfied). Since S does not necessarily 
fulfill a triangle inequality we have to use a generalized formulation of the tangential 
cone condition, which follows from the standard formulation if S is given by the power 
of a norm (cf. Lemma |5.2| ) . 

Assumption 4 (Generalized tangential cone condition): 

(A) There exist constants rj (later assumed to be sufficiently small) and Ctc > 1 such 
that for all 5°'^" G 3^°*^" 

^-S(g°'^';F{v))-riS ( g''''' ; F {u 



Ctc 

<s(^g''^'';F{u)+F'{u;v-u)^ (9a) 
<CtcS (g°'''; F (v)) + t]S (5°^^ F (u)) for ah u,vG<3. 
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(B) There exist constants rj (later assumed to be sufficiently small) and Ctc > 1 such 
that 



1 



tc 



r(g^;F{v))-riT(g^;F{u 



<T [g^]F{u)+F' {u]v-u) 
<CtcT(g^;F{v))+r^T(g^;F{u: 



(9b) 



for all n, f G !B. 



This condition ensures that the nonlinearity of F fits together with the data misfit 
functionals S or T. Obviously, it is fulfilled with ry = and Ctc = 1 if F is linear. 

It is well-known that for ill-posed problems rates of convergence can only be obtained 



under an additional "smoothness condition" on the solution (see 16, Prop. 3.11]). In a 
Hilbert space setting such conditions are usually formulated as source conditions in the 
form 



,t 



Uq 



LP F' 



,t 



F' 



u 



UJ 



(10) 



for some u ^ X where if : [0, oo) — t- [0, oo) is a so-called index function, i.e. ip is 
continuous and monotonically increasing with (/?(0) = 0. Such general source conditions 



were systematically studied in [24 , 37 . The most common choices of Lp are discussed in 
section [5l 

To formulate similar source conditions in Banach spaces, we first have to introduce 
Bregman distances, which will also be used to measure the error of our approximate 



^||n — no 
v) is given by 



solutions (see 14): Let u* E dTZ{u^) be a subgradient (e.g. 



Uq if TZ{u) 



2 with a Hilbert norm || • ||). Then the Bregman distance of TZ between u and 



V^{u,u^ 



n(u)-n{u 



u ,u — w 



^\\u — tiolPi we have {u,u^) 



iik-nt||2. 



If is a Hilbert space and TZ{u) — 

Moreover, if is a g-convex Banach space (1 < g < 2) and lZ{u) = Unll*^, then there 
exists a constant Cbd > such that 



u — w 



ill) 



for aXi u ^ X (see e.g. [lO]). In those cases, convergence rates w.r.t. the Bregman 
distance also imply rates w.r.t. the Banach space norm. 



Now we can formulate the following variational formulation of the source condition (10), 
which is a slight variation of the one proposed in |31|: 



Assumption 5 A (Multiplicative variational source condition): There exists u* E 
dlZ (n"f) C X' , /? > and a concave index function ip : (0, oo) — )• (0, oo) such that 



u*,u^ -u) < I3V^ (n,nt^' 



'T{g^;F{u)) 
(n,nt) 



for all n E *B. 



fl2) 
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Moreover, we assume that 



t ^ 



is monotonically decreasing. 



(13) 



As noted in [31] using Jensen's inequality, a Hilbert space source condition (10) for which 
(99^) ^ is convex imphes the variational inequahty 



u ,u — w 



u — w 




(14) 



The tangential cone condition now shows that an inequahty of type ( 12 ) is vahd and 



hence, in a Hilbert space setup Assumption 5 is weaker than ( 10 ) at least for linear 



operators. As opposed to [31] we have omitted absolute values on the left hand side of 



(12) since they are not needed in the proofs, and this form may allow for better index 
functions ip if is on the boundary of 



In many recent publications |12||17[[26l|43 variational source conditions in additive rather 
than multiplicative form have been used. Such conditions will be discussed in section |4j 
Since we use a source condition with a general index function ip, we need to restrict the 
nonlinearity of F with the help of a tangential cone condition. Nevertheless, we want 
to mention that for (p{t) = t^/^ in (12) our convergence analysis also works under a 



generalized Lipschitz assumption, but this lies beyond the aims of this paper. The cases 
(f (t) = with v > \ where similar results are expected are not covered by Assumption [sj 

since for the motivation in the Hilbert space setup we needed to assume that (99^) ^ is 
convex, which is not the case for > ^. 

In our convergence analysis we will use the following two functions, which are both index 
functions as well as their inverses: 



G(t) := 
^{t) :- 



tip^ (t) , 

VW) 



Viif (t) 



(15) 



We are now in a position to formulate our convergence result with a priori stopping rule: 

Theorem 2.3. Let Assumption\l^ or[^]S an(i[^ hold true, and suppose that 

T], Vj^ (licu"!") and T ((7^; F (uq)) are sufficiently small. Then the iterates Un defined by 
(|4| with exact data g°^^ = fulfill 

(un,u^) = O {p^an)) , (16a) 
r(<7^;F(n„)) =0(e(a„)) (16b) 

as n 00. For noisy data define 

err„ := — err {F (un+i)) + 2r]Ctc err {F (u„)) + CtcCerr err (gA (17a) 
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in case of Assumption^^ or 

err„ := err (F + F' u„+i - n„)) ^^^^^ 

+Ccrr err (F (Un) + F' [Un, U'^ - Un)) 

under Assumption\^, and choose the stopping index by 

:= mill {n G N I (a„) < r err„} (18) 



with a sufficiently large parameter r > 1. Then (16) holds for n < n^: and the following 
convergence rates are valid: 

{nn,,u^) = O [ip^ (e-i (err^J)) , (19a) 

r f 5^ (nnj) = O (err„ J . (19b) 



3 Proof of Theorem 12.3 



We will split the proof into to two main parts. For brevity we will denote 

dn:=V^ (un,u^)\ (20) 
Sn:=T(g'';F{un)) ■ (21) 
Let us now start with the following 



Lemma 3.1. Let the assumptions of Theorem \2.3\ hold true. Then we have a recursive 
error estimate of the form 

Oindn+i + \, Sn+1 < f] ( Ceir + 77— ) Sn + anPdn+if ( -^^] + err„ (22a) 

in the case o/[^]S and 

+ 7r~F^ — Sn+i < '^rjCcrvSn + ctnPdn+nf I -j^ — ) + err„ (22b) 

L^tcL^err \a^,i 



the case o/[7[4 for all n € N. 
Proof. Due to ( |12[ ) we have 

7^(^^„+l) - n (^u^^ = T>^ (^Un+i,u^^ - (u*,u^ - Un+i^ 

> dl+, - Pdn+w . (23) 
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Prom the minimality condition (4a) with u = we obtain 

an {n {un+i) - n (ut)) + S (5°^^ F (n„) + F' - n„) 



(24) 



and putting ( 23 ) and \2A\ together we find that 



andl+i + S (^5°^*^; F + F' (u„; n„+i - 

<5 (5°'''; {Un) + F' (un\ -Un))+ Un^dn+W 



Sn+1 



(25) 



In the case of|4^ we use ([S]), which yields 
+ y^T(g^;F{un) + F' (n 



Sn+1 
"n+1 



+ err^ 



and ( 9b ) with v = u\ u = Un leads to 

+ 7^7- <7^;F(n„) + F' (n 



Sn+l 
"n+l 



+ err„. 



By (9b) with v = Un+i, u = Un obtain (22a) 



• In the case of|4k we are able to apply (l9a|) with v = u\ u = Un and (pJal) with 



V = Un+i and u = m„ to ( 25 ) to conclude 
1 

Ctc 



andl_,, + —S(g°''';F{un+i) 



<2rjS F (un)) + CtcS (5°^^ F (nt)) + an^d^+W (^) • 

Due to ^ and Assumption [2] 2 this yields ([22b]). 



Before we deduce the convergence rates from the recursive error estimates (22) respec- 



tively, we note some inequalities for the index functions defined in ( 15 ) and their inverses: 
Remark 3.2. 1. We have 

(f ("i?"^ (Ct)) < max 
(e-i (Ct)) < max 1} ^2 (e-i (t)) (27) 

for all t > and C > if defined, where each inequality follows from two applica- 



tions of the monotonicity assumption (13) (see [31; Remark 2]) 
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2. Since (p is concave, we have 

if (At) < A(/3 (t) for all t sufficiently small and A > 1 



3. (28) implies the following inequality for all t sufficiently small and A > 1; 

e (Xt) < A^G (t) 



(28) 



(29) 



The following induction proof follows along the lines of a similar argument in the proof 
of 31 , Theorem 1]: 



Lemma 3.3. Let the assumptions of Theorem 2.3 hold. Then an estimate of the kind 
(22a) implies 



dn<Ciip (a„) , 
Sn < C2Q ian) 



(30) 
(31) 



for all n < n^: in case of noisy data and for all n £ N in case of exact data where (due 
to rj sufficiently small) 



Co = max < 



4/3^ (CtcCerrC'dec) 



2CtcCerrCdcc 



1 - 2C|j,j,CtcCerr^ (Ccrr + ^ 
Ci = max {v^^, V2 (r/Cs (Qrr + 1/Cerr) + l/r)Cdec} . 



Since (22b) is of the same form as (22a) (only the constants differ), (30) and (31) are 
(with slightly changed constants) also valid under (22b). 



Proof. For n = (30) and (31) are guaranteed by the assumption that do and sq are 
small enough. For the induction step we observe that (22a) together with (18) and the 
induction hypothesis for n < n^, — 1 implies 



-Sn+l < Cy^rQ (On) + anl^dn+lf 



Sn+1 
~ip 



C'tcC'err 

where Cn^r = (Ccrr + l/C^orr) + I/''"- Now we distinguish between two cases: 

<c^,,e(a„). 

1 

-Sn+l < 2Cr,,TQ (an) 



Case 1: Un^dn+i^p ^ 

\ "n + l 

In that case we find 



CtcCerr 



which by 9 (t) /t = Lp^ (t), (28) and (29) implies 



dn+l < \f^C^ip (a„) = ^j2C^(p ( —an+l ) < \/2C^CdecV (On+l) , 

•Sn+l < 2CtcCerrC',j,r0 (cKn) < 2CtcCerrC',)_r C'dec© (o^n+l) • 
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The assertions now follow by y^^C^Cdcc < Ci and 2CtcCerrC'r?,rC'|ec — ^2 which is 
ensured by the definition of C2. 

Case 2: a„/3d„+i(^ > C^,,G (on). 

In that case we find 



-Sn+1 < 2a„/3d„+i(/? 



■Sn+1 
"n+1 



CtcC'err 

If = 0, then this implies Sn+i = and hence the assertion is trivial. By multiplying 
with ^Sn+i and dividing by we have 

1 Sn+l 



L^tc^err '^n-i-l 



Sn+l 



(32) 



Considering only the first term on the left hand side of ( 32 ) this is 



1 / ^/Sn+l 



< 



Sn+l 



(33) 



(34) 



2/3 ; - dl^, 

and by considering only the second term on the left hand side of ( 32 ) 

<D (^) < 2/3CteCerran 

\"n+l/ 

where ^ {t) = ^/i/ip{t) = t/'d{t). Plugging (33) into (34) using the monotonicity of <I> 



by (13) we find 



^ CtcCerrCtn- 



Since $ (t)) = (t) /t this shows 
Hence, 

Sn+l < 4/3^6 (CtcCerrOn) 

which by (|29]) and 4^^ (Cdcc^tcCcrr)^ < C2 impUes 

Now from (t) = Viip {t) we find 6^ (^(^ (^^-1 = a/i?"^ and hence by ([33]) 



< 4/3M ( ^ 



/a 



2/3 



i?(an+l^ 



<2/3y^V' {c^n+iY 
< Cfip{an+i)'^ 
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where we used (pel), C2 > 4/3^ due to CdecCtcCcrr > 1 and ^/2^^fC2 < Ci. 



Therefore, we have proven that (30) and (31) hold for all n < n^, (or in case of exact 
data for all n G N). ■ 



With these two lemmas at hand we are able to complete the Proof of Theorem 2.3 
Inserting (18) into (30) and (31) we find using ([27]) 



< Civ?2 {an,) = O (^2 (e-i (err„J)) 



and 



TU;F{un,)] < C2Q{an,) = 0{evrn:, 



4 A LepskiT-type stopping rule and additive source conditions 

In this section we will present a convergence rates result under the following variational 
source condition in additive form: 

Assumption 5B: There exists u* G &R{v)) C A", parameters j3i G [0,1/2), /32 > 
(later assumed to be sufficiently small), and a strictly concave, differentiable index 
function (p satisfying if' {t) 00 as t \ such that 



for all G 5S . 



(35) 



A special case of condition (35), motivated by the benchmark condition u* = F [n^ 



was first introduced in |25j to prove convergence rates of Tikhonov-type regularization 
in Banach spaces (see also |43|). Flemming 17 uses them to prove convergence rates 



for nonlinear Tikhonov regularization ^ with general S and IZ. Bot &: Hofmann 12 
prove convergence rates for general and introduce the use of Young's inequality which 
we will apply in the following. Finally, Hofmann & Yamamoto |26] prove equivalence in 
the Hilbert space case for 97 (t) = ^/t in (10) and (35) (with different cf. 26, Prop. 



4.4]) and almost equivalence for (/? [t) = t'^ with u < ^ in ( |10[ ) (again with different (p in 
(|35[), cf. 26, Prop. 6.6 and Prop. 6.8]) under a suitable nonlinearity condition. 



Latest research results show that a classic Hilbert space source conditions (10), which 



have natural interpretations in a number of important examples, relates to (35) in a 



way that one obtains order optimal rates (see [18] )■ Nevertheless, this can be seen much 
easier for multiplicative variational source conditions (see (14)). 

The additive structure of the variational inequality will facilitate our proof and the result 
will give us the possibility to apply a Lepskii-type stopping rule. We remark that for 
5 7^ in Assumption J2] it is not clear how to formulate an implementable discrepancy 
principle. 



Given (/? in ( 35 ) , we construct the following further index functions as in [12] , which will 



be used in our convergence theorem: 
ff t = 0, 



ift>0, 
ff t = 0, 



(36a) 
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^/'(f) = / ^-1 (s) ds, t>0, (36b) 







A = inf i g concave index function, g (t) > for t > /■ . (36c) 



The definition (36c) ensures that vA is concave, which by (4b) implies 



(A(a„))5 <Cj^,(A(a„_i))^ (37) 
for ah (; > 1 and n E N. S incG for linccir problGins 'sj^ (c^n) I 

is a bound on the 

approximation error (see [12]) and since for Tikhonov regularization the approximation 
error decays at most of the order 0(a„), we expect that 1 1— )■ ■s/^{t)/t is " asymptoticahy 
concave" in the sense that Iimj\^o A(t)t/l^'(t) = 1, so we don't loose anything by replacing 
^{t)/t by A(t). Indeed, it is easy to see that this is the case for logarithmic and Holder 
type source conditions with v < 1, and in the latter case t i— )• yJW(€)Jt itself is concave 
everywhere. 

Lemma 4.1. Let Assumption^ or[^]B and^^ hold true and assume that 

there exists a uniform upper hound err„ < err for the error terms err„ in Theorem \2.3\ 
Then, with the notation (20), the error of the iterates Un defined by ^ for n > 1 can be 
bounded by the sum of an approximation error bound ^g,pp{n), a propagated data noise 
error bound $noi(?T-) cind a nonlinearity error bound <I>nl('^)) 

dl < $ni (n) + ^app (n) + $noi (n) (38) 

where 

$„l (n) :=2?7Cnl — , 
^-app (n) := 2/32A(a„_i), 
^'noi (n) 



a-ii-i 



and Cnl := max{2Cerr) C'err + 1/C'err}- Moreover, if rj and (^2 are sufficiently small, the 
estimate 

^nl {n) < 7nl (^noi (n) + $app (n)) (39) 

holds true with 

7„l := max <^ - — ^S^^^, , ^ ttv > , 1 — 



l-GL7'^app(l) + $„oi(l)i' c-k--/3: 



Proof. Similar to the proof of Lemma 3.1 the assumptions imply the iterative estimate 



1 1 



(l-/3i)<+l + ^ ^ — Sn+i < r] Cerr + 7; — s„ + an(32^ (sn+i) + err 

^tc^err V 
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for all n G N in case of of [4j3 and 



an (1 - /3i) d^+i + ^ Sn+i < 2'qCerrSn + anl32'P (sn+i) + err 

for all n G N in case of ij^. Now Young's inequality ab < (t) dt + Jq (s) ds 
(cf. [23| Thm. 156]) with the index function tp defined in ( 36a| ) applied to the second- 
last term yields 



This shows that 



an (1 - Pi) dl+i + 



CtrCp. 



^2 Sn+1 < ilC^i^Sn + ^2^^ (on) + err (40) 



^ < A (t) this yields 



for all n G N both in case |4]A and in case |4p. Together with 1/(1 — /3i) < 2 and 

dl+^ < 2?7C7nl— + 2/32 A (a„) + 2* 



err 



for all n > which is by definition ( 38 ) . 



From (40) we conclude that 



]_ 

CtcCc: 



1 



1^2 ^ , ^ err 



/32 



P2 



Now multiplying by 2r/CNL/an+i we find 

^nl (n + 2) < 7$nl (ra + 1) + 7^app (n + 1) + 7^noi {n + 1) 



for all ra G N. Now we prove (39) by induction: For n = 1 the assertion is true by 



the definition of 7ni. Now let (39) hold for some n. Then by the inequality above, the 



induction hypothesis, (37), and the monotonicity of <I>noi we find that 



$„l (n + 1) < 7$nl in) + 7$app {n) + 7^noi {n) 

< 7 (1 + 7nl) (^'app (n) + $noi (n)) 

< CL7 (1 + 7nl) (^app (n + 1) + «>noi (n + 1)) . 

The definition of 7ni implies Cjg^7 (1 + 7ni) < 7ni and hence the assertion is shown. 



Lemma 4.1 allows us to apply the Lepskii balancing principle as developed in 5 6 ,36 ,37 



as a posteriori stopping rule. Since the balancing principle requires a metric on X we 



assume that (11) holds true. As already mentioned, this is for example the case if X is 
a g-convex Banach space and TZ{u) = 



Together with (11) and taking the g-th root it follows from Lemma 4.1 that 



1 



'^11 < CLA^nX (n) 



app 



{n)~^ +^>noi {n) 
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Whereas ^app and are typically unknown, it is important to note that the error 
component $noi is known if an error bound err is available. Therefore, the following 
Lepskii balancing principle can be implemented: 



A^-^ax := min |n G N | C^'d^noi (n)^ > l| (41a) 
ribai := min \ne {1,..., iVmax} I Vm > n ||n„ - Um\\ < c^^^j (m) } (41b) 



Moreover, it is important to note that $noi is increasing and <I>app is decreasing. There- 
fore, the general theory developed in the references above can be applied, and we obtain 
the following convergence result: 

Theorem 4.2 (Convergence rates under Assumptionlsh). Let the assumptions of Lemma 
4-1 hold true and assume that T)^ {uq,u'^^ and S [g,F (uq)) are sufficiently small. 



1. exact data: 

Then the iterates (un) defined by Q with exact data g°^^ = fulfill 



C (u„,nt) =O(A(a0), 



2. a priori stopping rule: 

For noisy data and the stopping rule 



n — )• oo. 



(42) 



n^, := min |n E N | iP" (a„) < errj 



with ^ defined in (36b) we obtain the convergence rate 



(un, ,u'^) = O {A{^-^ (err)) ) , err ^ 0. 



(43) 



3. Lepskii-type stopping rule: 



Assume that (11) holds true. Then the Lepskii balancing principle (41b) with 



c = C^A (1 + 7ni) leads to the convergence rate 



O (A (tf^^ (err))) , err 0. 



Proof. By (38) and (39) we find d„ < (1 + 7ni) (^app {n) + $noi ("-)) which implies part 
1 and 



£ < (l + 7ni) 2/32A(a„._i) + 2 



err 



Using the definition of n^, and (37) we have 



err 
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Using the definition of again we obtain < (err). Putting tliese estimates 
together yields (43). 

To prove part 3 assume that err is sufficiently small in the following. We use again 
< (1 + 7ni) (*^app (n) + <I>noi (n)), which yields by (11) the estimate 



Un-U 



< C^^ (1 + 7nl) M ^app in) ^ + $noi (n) 



for aU n G {1, A^max}- Define V (j) := 2C^a (1 + 7nl)^ ^-noi (iVmax + 1 - j) and (j) := 

i 1 

2Cbd (1 + 7ni) " ^app (A^max + 1 - j) and note that </> (1) < V (1) if and only if $app (iVmax) < 
1. This is the case if iVmax is sufficiently large which holds true for sufficiently small err 
as assumed. Thus by (37) we can apply 36, Cor. 1] to gain 

2 1 

<6(l + 7nl)^C]^,C^^ min ($app (n)^ + $noi (n)^) . 

If we can show that G {1, A'max} we obtain the assertion as in part 2. Since by 
definition a„^_i > (err), we have 



^noi (n*) = 2-?^ < 2 "7 < 2A (0^-1 (err)) 
a„,_i <F-^ (err) ^ 



err 



and hence n* < iVmax if err is sufficiently small. 



5 Relation to previous results 

The most commonly used source conditions are Holder-type and logarithmic source 
conditions, which correspond to 

^^t) := (0,1/2], (44a) 

_ f(-ln(t))-^ ifO<t<exp(-p-l), 

10 if t = 0, 

respectively. For a number of inverse problems such source conditions have been shown 
to be equivalent to natural smoothness assumptions on the solution in terms of Sobolev 
space regularity (see (T6l|28]). We have restricted the range of Holder indices to z/ E 



(0,1/2] since for > 1/2 the monotonicity assumption (13) is violated. By computing 



the second derivative, one can easily see that the functions ipp are concave on the interval 



[0,exp(— p — 1)], and condition (13) is trivial. If necessary, the functions ipp can be 
extended to concave functions on [0, oo) by suitable affine linear function on (exp(— p — 
l),oo). 



We note the explicit form of the abstract error estimates ( 19 ) for these classes of source 
conditions as a corollary: 

Corollary 5.1 (Holder and logarithmic source conditions). Suppose the assumptions of 
Theorem \2.3\ hold true. 
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1. If if in (12) is of the form (44a) and n* := min s n G N I a„ < r errj^"*"^" > with 



r > 1 sufficiently large, then 



^tC («n. , ^xt) = O (^errA?^ ) . (45a) 

If ip = (pp, := min{n G N | < rerr„,} and r > 1 sufficiently large, then 

(un„uA = O {p2p (err^J) . (46a) 



Proof. In the case of Holder source conditions we already remarked that the conditions in 
Assumption are satisfied v G (0, 1/2], and we have 6 {t) = t^^"^", e-^(^) = ^V{i+2;.)^ 
In the case of logarithmic source conditions we have @ {t) = t ■ (p2p (t) . The function 
does not have an algebraic representation, but its asymptotic behavior at can 
be computed: 6"^ (t) = (1 + o(l)) as t \ 0. This implies that ipp (9"^ (t)) = 

(^p (t) (1 + o (1)) as t \ 0. Note that the proposed stopping rule n*, which can be 
implemented without knowledge of the smoothness index p, deviates from the stopping 
rule 

I* := min{n G N | a„(^2p(«n) < Terr„} 



proposed in Theorem 2.3 Asymptotically we have > n*, and hence (16) holds for 



n = n^. Therefore, we still get the optimal rates since 

'^n (iin.,un = O {(p2p (ofij) = O {(p2p {y/T err^ij) = O {(p2p (err^^J) . 



Recall from section [2] that we can choose 

err = <5'- if Wg-"-- - g^ \\y < 5 and 5 (52; 5i) = Ibi - 52|IS^, T = 5 
with r G [1, cxd). In particular, if X and y are Hilbert spaces, r = 2 and TZ = \\u — uq 



|2 



for some uq ^ X, then (45a) and (46a) translate into the rates 

||n„, -u\\ = (^5^^ , 

0((-ln<^)-^), 



u\ 



respectively, for 5 — )• (see, e.g., |32j), which are known to be optimal for linear inverse 
problems. 

It remains to discuss the relation of Assumption [4] to the standard tangential cone 
condition: 

Lemma 5.2 (tangential cone condition). Let S (52; 5i) = T {g2'-,gi) = \\gi — g2\\y- If F 
fulfills the tangential cone condition 

\\F{u) + F' {u]v-u)-F{v)\\y<fi\\F{u)-F{v)\\y forallu,v£^ (47) 

with fj > sufficiently small, then Assumptions^^ and[^]B are satisfied. 
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Proof. Using the inequality (a + bY < 2^ (a^ + b'^), a,b > we find that 

\\F{u) +F' (u;v-u) - g\\y 

< (n) + F' {u;v-u)-F {v)\\y + ||F {v) - g\\y 

<r-'f \\F (u) - F (vWy + 2-^ ||F (v) - 

<2'^-^^ \\F (n) -g\\'y + (2-1 + fi^^^'^-') \\F (v) - gfy . 

Moreover, with \a — b\^ > 2^~^'a^ — b^ , a,b > we get 

\\F{u) + F' {u;v - u) - g\\y 



> 



1^ (^^) - aWy - \\F (u) + F' {u;v - u) - F (v) \ 



y 



r 



>2'-r\\F{v)-gry-f\\F{u)-F{v)ry 

>2'-^ \\F (v) - g)ry - 2^-'f \\F {u) - g^y - 2^~'f \\F (v) - gW'y 
= (2^-^ - 2^-'f) \\F (v) - gWy " ^^'W \\F (n) - 
for all g y. Hence, Q holds true with rj = 2'^'^~'^ff and 

Ctc = max I ^- 2'-i + f/^'22-2| > ^ 

if 7) is sufficiently small. ■ 

6 Convergence analysis for Poisson data 

In this section we discuss the application of our results to inverse problems with Poisson 
data. We first describe a natural continuous setting involving Poisson processes (see 
e.g. [l]). The relation to the finite dimensional setting discussed in the introduction is 
described at the end of this section. 

Recall that a Poisson process with intensity g G L"'^(M) on some submanifold M C M*^ 
can be described as a random finite set of points {xi, . . . , xat} c M written as random 
measure G = '}2in=i such that the following conditions are satisfied: 

1. For all measurable subsets M' C M the number G(M') = #{n : Xn G M'} is 
Poisson distributed with mean J^, gdx. 

2. For disjoint measurable subsets M'^, . . . , M'^ C M the random variables G{M[), . .., G(M^ 
are stochastically independent. 

Actually, the first condition can be replaced by the weaker assumption that EG(M') = 
J^,gdx. In photonic imaging g will describe the photon density on the measurement 
manifold M, and xi, . . . ,xn with denote the positions of the detected photons. For a 
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Poisson process G with intensity g and a measurable function -0 : M — >• M the following 
equalities hold true whenever the integrals on the right hand sides exist (see [33]): 



E / VdG 



'05'^ dx . 



Var / dG = j ip'^g^ dx 
M 



(48) 



We also introduce an exposure time t > 0. Our convergence results will describe re- 
construction errors in the limit t — )• oo. Assume the data Gt are drawn from a Poisson 
process with intensity tg^ and define Gt := jGf. The negative log- likelihood functional 
is given by 



SiGt;g) = { 



f gdx - flngdGt = f gdx - i En=ilnfif(x„) , g>0 



M 



DO , 



else. 



(49) 



We set InO := — oo, so S {Gt; g) = oo if g{xn) = for some n = 1, . . . ,N . Using (48) we 



obtain the following formulas for the mean and variance of 5 {Gt\g) if the integrals on 
the right hand side exist: 



E5 {Gt;g) 



g - g'f In g 



dx , Var S {Gf, d) = \ I (In 5)^5^ dx . (50) 



The term 5(5+) = E5 {Gt;g^) = /j^b^ - g^\^g^]dx with OlnO := is finite if c/t € 
L^(M) n L°^(M) and 5^ > as assumed below (see e.g. 46, Lemma 2.2]). Abbreviating 
the set {x G M : g\x) > 0} by {g^ > 0} we set 



g] ■.= KL(g^;g] : 



I 

{gt>o} 



g- a'^ - a 



00 . 



dx , ^7 > 



else. 



(51) 



It can be shown that the integral is well-defined, possibly taking the value -|-oo, i.e. the 
negative part of —g'^\n{g'^/g) is integrable if g,g'^ G L^{M) and g^g'^ > (see e.g. 
Lemma 2.2]). We find that Assumption [2] holds true with Gq„ = 1 and 
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err{g) 



\j^Hg)[dGt-g^dx) \ , 5>0 
, else. 



(52) 



This motivates the following assumption: 

Assumption V'. With the notation of Assumption [T] assume that 

1. M is a compact submanifold of M°', y := ^^(M) n C(M) with norm \\g\\y 
IIs'IIli + llslloo and 

F(u) > for aU u G 55. 
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2. For a subset 3^ C 3^ specified later there exist constants PQ,to > and a strictly 
monotonically decreasing function ( : {po,oo) — )• [0,1] fulfilling limp^oo C(p) = 
such that 



sup 



for all p > po and all t > to. 



Hg) (dGt-gUx 



> 



<C(P) 



(53) 



It remains to discuss the concentration inequality (53). A general result of this type, 



which can be seen as an analog to Talagrand's inequalities for empirical processes, has 
been shown by Reynaud-Bouret [42| Corollary 2]. She proved that for a Poisson process 
G with intensity g G L^(M.) and a countable family of functions {fn}nGN with values 
in [—6, b] the random variable Z := sup„gj^ |/ /„ {dG — ^dx )| satisfy the concentration 
inequality 

P > (1 + e)E(Z) + ^yl2vop + K{e)bp^ < exp{-p) (54) 

for all p, e > with vq := sup„gpj / fnd'^^ ^'^d K(e) = 5/4 + 32/e. We can apply this 
result with G = tGt and g = tg'^ if y is separable and || ln((jr)||oo < b for all g G y . 
Under additional regularity assumptions (e.g. M Lipschitz domain and su p||| ln(g)||jj'a : 
g(^y} <oo with s > dim(M)/2) it can be shown that £(2") < C/^fi (see [48, sec. 4.1]). 
This yields a concentration inequality of the form (53) with C,[p) := exp(— cp) for some 
c> 0. 

An essential restriction of Reynaud-Bouret 's concentration inequality in our context is 
the assumption ||ln((7)||oo < b for all g G y. This does not allow for zeros of F{u) 
even on sets of measure if F{u) is continuous, which is a very restrictive assumption. 
Therefore, we introduce the following shifted version of the Kullback-Leibler divergence 
^ involving an offset parameter o" > and a side-constraint 5 > — |: 



T[g^;g 



oo 



[g^ + a;g + a) if(7>-f 
otherwise. 



(55) 



Note that (51) and (55) coincide for o" = 0. Correspondingly, we choose 



c,^ N //Mb-f^ln((7 + a)] dx-/i^ln(g + cj)dGt if5>-f, 
S{Gt;g):=\ (56) 
oo else 



as data misfit functional in (4a). Setting 5{g'^) := f^[g^ — {g^ + cr) ln{g^ + a)] dx, As- 
sumption [2] is satisfied with 



err (5) : = 



' J^ln{g + a){dGt- gUx) , <?>-§, 



^0 else. 
Remark 6.1 (Assumptions [sjA and [5^ (source conditions)). Using the inequality 

\\gi - g2\\\2 < (^\\gi\\L°° + ^||52||l°° j ikl {g2;gi) 



(57) 
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(see fll\ Lemma 2.2 (a)]), Assumption^^/B with T{gi;g2) = \\gi — 52||^2 imply As- 
sumption^^/B with T{gi;g2) = ^^{gi'ig2) if F{^) is hounded in L°°(M). However, 
Assumptions ^^/B with T{gi;g2) = ^^{gi',g2) 'may be fulfilled with a better index 
function if F{u^) is close to in parts of the domain. 

Before we state our convergence result, we introduce the smallest concave function larger 
than the rate function in Theorem 14.21 



(58) 



(p := inf I (p concave index function, <^ (s) > A (l^ ^ (s)) for s > O} 



Prom the case of Holder-type source conditions we expect that (p will typically coincide 
with A o tp'-i at least in a neighborhood of (see e.g. f26| Prop. 4.3]). 

Corollary 6.2. Let the Assumptions^^and^Bp hold true. Moreover, assume that one 
of the following conditions is satisfied: 



Assumptions^^ and V hold true with S and T given by (49) and (51) and y 
F(5S). 

Assumptions^^ and V hold true with T and S given by (55) and (56) and 
y ■.={F{u) + a: u e 05} 

U ^F{u) + F'{u; V - u) + a : u,v e ^, F{u) + F'{u; v - u) > 



Suppose that (^2 is sufficiently small, *B is bounded and TZ is chosen such that (11) 



holds true, and Lepskii's balancing principle (41) is applied with c = C^A[1 + 7ni) and 



err 



with a sufficiently large parameter t (a lower will be given in the proof). 



Then we obtain the following convergence rate in expectation: 



E 



<Ohp 



t — )• oo. 



(59) 



Proof. In the case of Assumption [4]A and cr = 0, we find that Assumption [2] holds true 
with err defined by (52). Assumption V implies that the terms err„ defined by (17a) 
in Theorem 2.3 satisfy 



rp 

sup err„ < — 



> 1 - C(p) 



(60) 



for all p > pq and t > tQ with r := 1 + 2r/Ctc + Ctc due to Cerr = 1- To show the 
analogous estimate in the case of Assumption [4^, recall that Assumption [2] holds true 
with err defined by (57). Prom the variational characterization of it follows that 



a 



F (Un) + F' {Un] Un+1 - Un) > - - 



(61) 
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Moreover, from Assumption |4p we conclude that 



> 



a 
2 



(62) 



This yields the inequality (60) with r := 2 also for err„ defined by (17b) using Assump- 
tion V. 

By virtue of (60) the sets Ep := |sup„gp^jj err„ < ^| have probability > 1 — ( (p) ii 

p > pq. Recall that C is monotonically decreasing and define p{t) := (l/-v/t) where 
we assume t to be sufficiently large. We have 



E 




<2'? ( max 









''"bal 



(63) 



sup ||n 



'P F 



Now we can apply Theorem 4.2 to obtain the error bound 



max 



< Clip (err) < CiTip 



Vi J 



with some constant Ci > for all sufficiently large t. In the last inequality we have used 
the concavity of ip. Plugging this into (63) yields 



E 



"bal 



< 2« Cirp 



C-niM) ^ 1 

r + —F sup \\u 



Since ip is concave, there exists C2 > such that s < C20 (s) for all sufficiently small 
s > 0. Moreover, 4= in the second term is bounded by —- — V^-* , and thus we obtain 
the assertion (59). ■ 

If C (p) = exp(— cp) for some c > as discussed above, then our convergence rates 
result ( |59| ) means that we have to pay a logarithmic factor for adaptation to unknown 
smoothness by the Lepskii principle. It is known (see fls]) that in some cases such a 
logarithmic factor is inevitable. 

The most important issue is the verification of Assumption V. In case of Assumption 
|4]A this follows from the results discussed above only under the restrictive assumption 
that F(u) is uniformly bounded away from for all u G 53. On the other hand for the 
case of Assumption l4fi we find that Assumption V is satisfied under the mild condition 



sup ll-F(n) + F'{u, V 
n, lie's 



Binning. Let us discuss the relation between the discrete data model discussed in the 
introduction and the continuous model above. Consider a decomposition of the mea- 
surement manifold M into J measurable disjoint subdomains (bins) of positive measure 
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> 0: 



J 

M=[jMj 
i=i 

In practice each Mj may correspond to a detector counting the number of photons in 



so the measured data are 



fff ^ = tGt 



,J, 



Consider the hnear operator Sj : L^(M) — M"^, {Sjg)j := J^^ gdx and the mapping 



S*j9 ■■= EU 



1-1 



g.lM-, which is adjoint to Sj with respect to the L^(M) inner 



product and the inner product {g,h) := T.j=i\^j\'^9-hj- Pj ■= S*jSj is the L^- 
orthogonal projection onto the subspace of functions, which are constant on each M.j. 
Sj can naturally be extended to measures such that {Sj{Gt))j = Gt{M.j) = ]#{n : x„ G 
Mj}. For distinction we denote the right hand sides of eqs. ^ and ^ by Sj and KLj, 
and define 5oo and KLoo by (49) and (51). Then 



Soo ( S*jg ) and ELj { 9^ ; 9) = Kh^o ( S*jg^; S*jg ) . 



The discrete data model above can be treated in the framework of our analysis by 
choosing 

1 



5(5^) := 5j [Sjg"^; Sjg^), and T := KLoo- Then Assumption [2] holds true with 



err{g) :-- 



^ln((5^5),)f^£;'^-(5j<7^). 

Pj9^;Pj9 



(64) 



if Sjg > 0, {j : {Sjg)j = 0, {Sg^)j + g"^^' > 0} = and err{g) := 00 else. To achieve 
convergence, the binning has to be refined as t — t- 00. The binning should be chosen 
such that the second term on the right hand side of (64) (the discretization error) is 
dominated by the first term (the stochastic error) such that the reconstruction error is 
determined by the number of observed photons rather than discretization effects. 



7 applications and computed examples 



Solution of the convex subproblems. We first describe a simple strategy to minimize 



the convex functional (4a) with S as defined in (56) in each Newton step. For the moment 



we neglect the side condition g > —a/2 in (56). For simplicity we further assume that 
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TZ is quadratic, e.g. TZ{u) 
order Taylor expansion 



\u — Uo\ 



We approximate S {g°^'^; g + h) by the second 



SW(9°'-i9](ft) :=5(<,°'»;9) + 



2{g + a) 



dx 



and define an inner iteration 



hn,i := argmin 
h 



5(2) 



^obs 



; F{Un) + F'[Un]{Un,l - Un)] (h) + anU{Un,l + h) 



(65) 



for / = 0,1,... with Unfl := and u„^/_(_i := + Sn,/^n,z- Here the step-length 
parameter Sn,i is chosen as the largest s G [0,1] for which sF'[un] > —rja — F{un) 
with a tuning parameter rj G [0, 1) (typically t] = 0.9). This choice of Sn^i ensures that 
F{un) + F' [un]{un,i+i — Un) > —fl'^: is a reasonable approximation to (4a), and 

7] = 1/2 ensures that Un^i+i satisfies the side condition in (56). It follows from the first 



order optimality conditions, which are necessary and sufficient due to strict convexity 
here, that Un^i = Un^i+i is the exact solution Un+i of (4a) if h^^i = 0. Therefore, we stop 
the inner iteration if ||/in,«|l/ll^",o|| is sufficiently small. We also stop the inner iteration 
if s„ ; is or too small. 



Simplifying and omitting terms independent of h we can write (65) as a least squares 
problem 



hn^i = argmin 
h 




9n,l + O- 



F'[un]h + 



obs 



gn,i - g 



dx 



(66) 



+ anil {Un,l + h) 



(66) is solved by the CG method applied to the 



with gn^i := F{un) + F'[u„](u„,; - u„ 
normal equation. 

In the examples below we observed fast convergence of the inner iteration (65). In 



the phase retrieval problem we had problems with the convergence of the CG iteration 
when On becomes too small. If the offset parameter a becomes too small or if cr = 



convergence deteriorates in general. This is not surprising since the iteration (65) 



cannot be expected to converge to the exact solution Un+i of (4a) if the side condition 
F{un) + F'{un'i Un+i — Un) > —a/2 is active at Un+i- The design of efficient algorithms 
for this case will be addressed in future research. 

An inverse obstacle scattering problem without phase information. The scat- 
tering of polarized, transverse magnetic (TM) time harmonic electromagnetic waves by 
a perfect cylindrical conductor with smooth cross section D C is described by the 
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equations 



Au + k^u = 0, 
du 
dn 



0, 



lim \/r 



0, 



in \ L>, (67a) 
on dD, (67b) 
where r := \x\,Us '■= u — Ui . (67c) 



Here D is compact, \ is connected, n is the outer normal vector on dD, and 
Ui = exp(iA;x • d) is a plane incident wave with direction d G {x G : |x| = 1}. This is 
a classical obstacle scattering problems, and we refer to the monograph [Ts] for further 



details and references. The Sommerfeld radiation condition (67c) implies the asymptotic 
behavior 

, , exp(ifc|x|) / f X \ f 1 



UAX] 



+ 



as |x| — > oo, and Uoo is called the far field pattern or scattering amplitude of Ug. 
We consider the inverse problem to recover the shape of the obstacle D from photon 
counts of the scattered electromagnetic field far away from the obstacle. Since the photon 
density is proportional to the squared absolute value of the electric field, we have no 
immediate access to the phase of the electromagnetic field. Since at large distances the 
photon density is approximately proportional to 
by the operator equation 

F{dD) = \u^ 



Uool , our inverse problem is described 
\\ (68) 



A similar problem is studied with different methods and noise models by Ivanyshyn & 
Kress |30]. Recall that |moo| is invariant under translations of dD. Therefore, it is only 
possible to recover the shape, but not the location of D. For plottings we always shift the 
center of gravity of dD to the origin. We assume that D is star-shaped and represent dD 
by a periodic function q such that dD = {g(t) (cost, sin t)^ : t G [0, 27r]}. For details on 
the implementation of F, its derivative and adjoint we refer to [27] where the mapping 
q I—)- Uoo is considered as forward operator. Even in this situation where the phase of 
Uoo is given in addition to its modulus, it has been shown in [27| that for Sobolev-type 
smoothness assumptions at most logarithmic rates of convergence can be expected. 
As a test example we choose the obstacle shown in Figure [l] described by q\t) = 
^\/3 cos^ t + 1 with two incident waves from "South West" and from "East" with wave 
number A: = 10 as shown in Figure [T} We used J = 200 equidistant bins. The initial 
guess for the Newton iteration is the unit circle described by go = 1, and we choose the 
Sobolev norm TZ (q) = \\q — <?o|Ihs with s = 1.6 as penalty functional. The regularization 
parameters are chosen as a„ = 0.5 • (2/3)". Moreover, we choose an initial offset param- 
eter a = 0.002, which is reduced by | in each iteration step. The inner iteration (65) is 
stopped when ||/in,i||/ll^n,o|| < 0.1, which was usually the case after about 3 iterations 
(or about 5 iterations for ||/in,dl/ll^n,o|| < 0.01). 

For comparison we take the usual IRGNM, i.e. ^ with S {g; g) = \\g — g\\'x^2 and TZ as 
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(a) true obstacle and total field for an in- (b) t|uool^ = tF{q^) for both waves (red 
cident wave from "South West" line) and corresponding count data g°^^ 

(blue crosses) 




-1 







1 




(c) results for 5 as in (561. blue: best, (d) results for S{g2;gi) = \\gi — ff2||^2. 
green: median, black: initial guess blue: best, green: median, black: initial 

guess 



Figure 1: Numerical results for the inverse obstacle scattering problem (68). Panels 



c) and d) show best and median reconstruction from 100 experiments with 
t = 1000 expected counts. See also Table [ij 



above as well as a weighted IRGNM where S is chosen to be Pearson's (/'■^-distance: 

^obs 1 2 



9 - 9 



9' 



obs 



dx. 



(69) 



Since in all our examples we have many zero counts, we actually used 

02 fg°b^max{g,c} 



5 (5°"^; 5 
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N 



E\\qN-q 



|2 
Il2 



^Var| 



„obs II 2 

5 IIl2 



100 



\9 



62 (5;max{5°^^0.2}) 
5 in eq. ([56|) 



„obs||2 

5' IIl2 




1000 



|5 



62 (5; max {5°*^", 0.2}) 
S in eq. (fSGl) 



„obs||2 

9 IIl2 



10000 



15 



62 (5; max {5°^% 0.2}) 
5 in eq. ([56|) 



9 

23 
5 



0.105 
0.076 
0.050 



0.004 
0.048 
0.005 



Table 1: L^-erroi statistics for the inverse obstacle scattering problem (68). The log- 
hkehhood functional (56) is compared to the standard L'^ and Pearson's (p'^ 
distance (cf. (69)) for different values of the expected total number of counts t 
with 100 experiments for each set of parameters. The error of the initial guess 



IS 11^0- 



"9^IIl2 = 0.288. All parameters as in Figure [ij 



with a cutoff-parameter c > 0. 

Error statistics of shape reconstructions from 100 experiments are shown in Table [T} The 
stopping index N is chosen a priori such that (the empirical version of) the expectation 
Ell — g^||2 2 is minimal for n — N, i.e. we compare both methods with an oracle stopping 
rule. Note that the mean square error is significantly smaller for the Kullback-Leibler 
divergence than for the -L2-distance and also clearly smaller than for Pearson's distance. 
Moreover the distribution of the error is more concentrated for the Kullback-Leibler 
divergence. For Pearson's (p'^ distance it must be said that the results depend strongly 
on the cutoff parameter for the data. In our experiments c = 0.2 seemed to be a good 
choice in general. 



A phase retrieval problem. A well-known class of inverse problems with numerous 
applications in optics consists in reconstructing a function / : M"' — )• C from the modulus 
of its Fourier transform \J-'f\ and additional a priori information, or equivalently to 
reconstruct the phase Tf/\Tf\ of Ff (see Hurt [29j). 

In the following we assume more specifically that f : ^ ^ C is oi the form /(x) = 
exp(i(^(a;)) with an unknown real- valued function 99 with known compact support supp((/?). 
For a uniqueness result we refer to Klibanov |34|, although not all assumptions of this 
theorem are satisfied in the example below. It turns out to be particularly helpful if ip 
has a jump of known magnitude at the boundary of its support. We will assume that 
SUPP93 = i?p = {x G M2 : |x| < p} and that 99 XBp close to the boundary dBp (here 
XBp denotes the characteristic function of Bp). This leads to an inverse problem where 
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(a) histogram for t = 10^ (b) histo eram for t 
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0.02 0.04 

(c) histogram for t = 10'* 
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E ( maxerr„ I 
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factor 






10^ 


0.1383 


4.53 
3.18 
3.28 
3.70 






10^ 


0.0305 






10^ 


0.0096 


1 


L 


lO*' 


0.0029 






10*^ 


0.0008 



0.004 0.008 0.0010.0020.003 

(d) histogram for t = 10^ (e) histogram for t — 10* 



(f) means for different t 



Figure 2: Overview for the error terms (17b) for the inverse scattering problem. For 

different values of the expected total number of counts the value maxn<20 err„ 

has been calculated in 100 experiments. The figure shows the corresponding 

histograms and means. The decay of order i.e. reduction by a factor of 

n 

3.16 in the table is clearly visible. All parameters are as in Figure 1 
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the forward operator is given by 

F:W{Bp) ^L°°(M), 



Bn 



(70) 



is typically of the 



Here H^{Bp) denotes a Sobolev space with index s > and M C 

form M = [— K, k]^. The a priori information on ip can be incorporated in the form of an 
initial guess ipQ = l. Note that the range of F consists of analytic functions. 
The problem above occurs in optical imaging: If f{x') = exp(i99(x')) = u{x' ,d) {x' = 
(xi,X2)) denotes the values of a cartesian component u of an electric field in the plane 
{x G : X3 = 0} and u solves the Helmholtz equation Am + k'^u = and a radiation 
condition in the half-space {x E : X3 > 0}, then the intensity g{x') = |u(x', A)p of 
the electric field at a measurement plane {x E : X3 = A} in the limit A — )• 00 in the 



Fraunhofer approximation is given by I-F2/I rescaling (see e.g. Paganin 39, Sec 
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Figure 3: Median reconstructions for the phase retrieval problem with t = 10^ expected 
counts. 



1.5]). If / is generated by a plane incident wave in X3 direction passing through a 
non-absorbing, weakly scattering object of interest in the half-space {x^ < 0} close to 
the plane {x^ = 0} and if the wave length is small compared to the length scale of 
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the object, then the projection approximation f{x') ~ | j'^^{n^{x' , X3) — 1) dxj, is valid 
where n describes the refractive index of the object of interest (see e.g. [39| Sec. 2.1]). 
A priori information on ip concerning a jump at the boundary of its support can be 
obtained by placing a known transparent object before or behind the object or interest. 
The simulated test object in Figure [3] which represents two cells is taken from Gieweke- 
meyer et al. |19j. We choose the initial guess (pQ = 1, the Sobolev index s = ^, and 



the regularization parameters an = ^ ■ (2/3)". The photon density is approximated 
by J = 256^ bins. The offset parameter a is initially set to 2 • 10~^ and reduced by a 
factor I in each iteration step. As for the scattering problem, we use an oracle stopping 
rule := argmin„E||(/9„ — (^"I^Ula- As already mentioned, we had difficulties to solve the 
quadratic minimization problems (66) by the CG method for small an and had to stop 
the iterations before residuals were sufficiently small to guarantee a reliable solution. 
Nevertheless, comparing subplots (c) and (e) in Figure |3| the median KL-reconstruction 
(e) seems preferable (although more noisy) since the contours are sharper and details in 
the interior of the cells are more clearly separated. 
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